Support async ChannelMonitorUpdate persist for claims against closed channels #3414
base: main
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff            @@
##              main    #3414    +/-   ##
=========================================
  Coverage    89.73%   89.73%
=========================================
  Files          130      130
  Lines       107793   107860     +67
  Branches    107793   107860     +67
=========================================
+ Hits         96727    96793     +66
+ Misses        8662     8656      -6
- Partials      2404     2411      +7

☔ View full report in Codecov by Sentry.
Force-pushed from 8fca379 to fb56bd3
Force-pushed from d581b8d to 76b3a99
Rebased on #3413 after merge. I promise this one isn't quite as bad, @jkczyz / @valentinewallace.
Just a few things that jumped out at me. Will need to take another pass at this.
-let latest_update_id = monitor.get_latest_update_id();
-update.update_id = latest_update_id.saturating_add(1);
+let latest_update_id = monitor.get_latest_update_id().saturating_add(1);
+update.update_id = latest_update_id;
Was this behavior wrong previously? (i.e., using the previous `latest_update_id` below)
The semantics have changed - because we're setting the value at update creation, we need the map to contain the value after this update, rather than the value before this update (which was used to set the update's value when it's applied).
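A minimal sketch of the semantic change, with illustrative names only (not the actual `PeerState` fields): the map now records the post-update id at creation time rather than the id of the last applied update.

```rust
use std::collections::HashMap;

type ChannelId = [u8; 32];

fn next_closed_chan_update_id(ids: &mut HashMap<ChannelId, u64>, chan: ChannelId) -> u64 {
	// Bump at creation time and store the post-update value, so any later lookup
	// already accounts for this (possibly still in-flight) update.
	let id = ids.entry(chan).or_insert(0);
	*id = id.saturating_add(1);
	*id
}
```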
Will take another pass as well but overall looks good
// If the A<->B channel was closed before we reload, we'll replay the claim against it on
// reload, causing the `PaymentForwarded` event to get replayed.
let evs = nodes[1].node.get_and_clear_pending_events();
assert_eq!(evs.len(), if close_chans_before_reload { 2 } else { 1 });
For my understanding, does this PR make the duplicate-`PaymentForwarded` issue worse or is it the same as before?
Uhhh, if you're using async monitor updating maybe? It shouldn't change anything for non-async monitor update users, but of course for async monitor update users it should only change things (incl a new spurious event) in cases where what we did before was just unsafe.
Force-pushed from 76b3a99 to 60653bb
}
if let Some((_, funding_txo, _, update)) = $shutdown_res_mut.monitor_update.take() {
	handle_new_monitor_update!($self, funding_txo, update, $peer_state,
		$channel_context, REMAIN_LOCKED_UPDATE_ACTIONS_PROCESSED_LATER);
}
// If there's a possibility that we need to generate further monitor updates for this
// channel, we need to store the last update_id of it. However, we don't want to insert
~10 lines down, it looks like we're setting the update id in `closed_channel_monitor_update_ids` to the update_id prior to the below background event update_id, but it seems like after 6489550 the update_id in said map should no longer ignore pending background events?
Grrr, good catch.
lightning/src/ln/channelmanager.rs (Outdated)
@@ -7237,37 +7237,42 @@ where
let mut preimage_update = ChannelMonitorUpdate {
	update_id: 0, // set in set_closed_chan_next_monitor_update_id
	counterparty_node_id: prev_hop.counterparty_node_id,
	counterparty_node_id: None,
What's the reason for this change?
Mostly spurious - I think when moving this code around for the 10th time, at some point I didn't have the counterparty id readily available, so I just removed it since it doesn't matter much (it's just used to keep the `ChannelMonitor`'s `counterparty_node_id` field updated if the `ChannelMonitor` was built before that field was added). I went ahead and re-added it, though.
let in_flight_updates = per_peer_state.in_flight_monitor_updates
	.entry(*funding_txo)
	.or_insert_with(Vec::new);
debug_assert!(!in_flight_updates.iter().any(|upd| upd == update));
in_flight_updates.push(update.clone());
Missing test coverage for this
Not sure what is missing test coverage here? If I add a panic here a ton of tests fail.
Oh, I had commented the lines out. Probably all tests passed because updates are duplicated in pending background events?
Right, it's duplicated there and `handle_new_monitor_update` ultimately happily treats it as an actually-new update and inserts it into the queue. I spent a bit trying to figure out how to test this but I don't actually think it's externally visible.
let in_flight_updates = per_peer_state.in_flight_monitor_updates
	.entry(*funding_txo)
	.or_insert_with(Vec::new);
debug_assert!(!in_flight_updates.iter().any(|upd| upd == update));
in_flight_updates.push(update.clone());
Did you mean to make this part of the DRYing commit?
Force-pushed from 60653bb to c199059
Basically LGTM, I'm good with a squash
Force-pushed from c199059 to cd65840
Squashed.
lightning/src/ln/channelmanager.rs (Outdated)
// Note that the below is race-y - we set the `update_id` here and then drop the peer_state
// lock before applying the update in `apply_post_close_monitor_update` (or via the
// background events pipeline). During that time, some other update could be created and
// then applied, resulting in `ChannelMonitorUpdate`s being applied out of order and causing
// a panic.
Self::set_closed_chan_next_monitor_update_id(&mut *peer_state, prev_hop.channel_id, &mut preimage_update);
Looks like this was supposed to be applied to 94d0735?
Looks like this fixup (now 7ce4e99) is still applied to the wrong commit?
Oops, not sure how I did that. The commit applies cleanly one up.
Now that we track the latest `ChannelMonitorUpdate::update_id` for each closed channel in `PeerState::closed_channel_monitor_update_ids`, we should always have a `PeerState` entry for the channel counterparty any time we go to claim an HTLC on a channel, even if it's closed. Here we make this a hard assertion, as we'll need to access that `PeerState` in the coming commits to track in-flight updates against closed channels.
In c99d3d7 we added a new `apply_post_close_monitor_update` method which takes a `ChannelMonitorUpdate` (possibly) for a channel which has been closed, sets the `update_id` to the right value to keep our updates well-ordered, and then applies it. Setting the `update_id` at application time here is fine - updates don't really have an order after the channel has been closed and can be applied in any order - and was done for practical reasons: calculating the right `update_id` at generation time takes a bit more work on startup, and was impossible without new assumptions during claim.

In the previous commit we added exactly the new assumption we need at claim time (as it's required for the next few commits anyway), so now the only thing stopping us is the extra complexity. In the coming commits, we'll move to tracking post-close `ChannelMonitorUpdate`s as in-flight like any other updates, which requires having an `update_id` at generation time so that we know which updates are still in-flight. Thus, we go ahead and eat the complexity here, creating `update_id`s when the `ChannelMonitorUpdate`s are generated for closed-channel updates, like we do for channels which are still live.

We also ensure that we always insert `ChannelMonitorUpdate`s into the pending-updates set when we push the background event, avoiding a race where we push an update as a background event, then, while it's being processed, another update finishes and the post-update actions get run.
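A hedged sketch of the "record as in-flight when pushing the background event" ordering described above. The types and field names here are illustrative stand-ins, not the actual `ChannelManager` internals:

```rust
use std::collections::HashMap;

type ChannelId = [u8; 32];

#[derive(Clone, PartialEq)]
struct MonitorUpdateSketch { update_id: u64 }

struct PeerStateSketch {
	in_flight_monitor_updates: HashMap<ChannelId, Vec<MonitorUpdateSketch>>,
	pending_background_events: Vec<(ChannelId, MonitorUpdateSketch)>,
}

impl PeerStateSketch {
	fn push_closed_chan_update_as_background_event(
		&mut self, chan: ChannelId, update: MonitorUpdateSketch,
	) {
		// Record the update as in-flight at the same time as queueing the background
		// event. If we only queued the event, another update could be generated,
		// applied, and have its post-update actions run while this one is still
		// waiting to be processed.
		let in_flight = self.in_flight_monitor_updates.entry(chan).or_insert_with(Vec::new);
		debug_assert!(!in_flight.iter().any(|upd| *upd == update));
		in_flight.push(update.clone());
		self.pending_background_events.push((chan, update));
	}
}
```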
Force-pushed from cd65840 to 8be0b8e
In d1c340a we added support in `handle_new_monitor_update!` for handling updates without dropping locks. In the coming commits we'll start handling `ChannelMonitorUpdate`s "like normal" for updates against closed channels. Here we set up the first step by adding a new `POST_CHANNEL_CLOSE` variant on `handle_new_monitor_update!` which attempts to handle the `ChannelMonitorUpdate` and handles completion actions if it finishes immediately, just like the pre-close variant.
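A rough sketch of the completion handling this variant is described as doing: attempt the persist, and only run completion actions if it finishes synchronously. The enum mirrors the shape of LDK's `ChannelMonitorUpdateStatus`, but the surrounding code is illustrative, not the actual macro:

```rust
enum UpdateStatus { Completed, InProgress }

fn handle_post_close_update<F: FnOnce() -> UpdateStatus, G: FnOnce()>(
	persist_update: F, run_completion_actions: G,
) {
	match persist_update() {
		// Synchronous persist: the update is durable, so post-update actions
		// (e.g. releasing a held preimage claim) can run immediately.
		UpdateStatus::Completed => run_completion_actions(),
		// Async persist: leave the update tracked as in-flight; the actions run
		// later, once the persister signals completion.
		UpdateStatus::InProgress => {},
	}
}
```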
One of the largest gaps in our async persistence functionality has been preimage (claim) updates to closed channels. Here we finally implement support for this (for updates at runtime). Thanks to all the work we've built up over the past many commits, this is a well-contained patch within `claim_mpp_part`, pushing the generated `ChannelMonitorUpdate`s through the same pipeline we use for open channels. Sadly we can't use the `handle_new_monitor_update` macro wholesale, as it also handles the `Channel` resumption, which we don't do here.
On startup, we walk the preimages and payment HTLC sets on all our `ChannelMonitor`s, re-claiming all payments which we recently claimed. This ensures all HTLCs in any claimed payments are claimed across all channels. In doing so, we expect to see the same payment multiple times; after all, it may have been received as multiple HTLCs across multiple channels. In such cases, there's no reason to redundantly claim the same set of HTLCs again and again. In the current code, doing so may lead to redundant `PaymentClaimed` events, and in a coming commit will instead cause an assertion failure.
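An illustrative sketch (hypothetical types, not the actual startup code) of deduplicating the startup re-claims by payment hash, so a payment received as multiple HTLCs across channels is replayed once rather than once per monitor:

```rust
use std::collections::HashSet;

type PaymentHash = [u8; 32];

struct RecentClaim { payment_hash: PaymentHash /* plus preimage, HTLC set, ... */ }

fn replay_startup_claims(
	claims_from_monitors: Vec<RecentClaim>, mut replay: impl FnMut(&RecentClaim),
) {
	let mut seen = HashSet::new();
	for claim in &claims_from_monitors {
		// Each monitor may report the same payment; only replay the claim for the
		// full HTLC set once, avoiding redundant `PaymentClaimed` events.
		if seen.insert(claim.payment_hash) {
			replay(claim);
		}
	}
}
```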
One of the largest gaps in our async persistence functionality has been preimage (claim) updates to closed channels. Here we finally implement support for this (for updates which are generated during startup). Thanks to all the work we've built up over the past many commits, this is a fairly straightforward patch, removing the immediate-completion logic from `claim_mpp_part` and adding the required in-flight tracking logic to `apply_post_close_monitor_update`. Like in the during-runtime case in the previous commit, we sadly can't use the `handle_new_monitor_update` macro wholesale, as it also handles the `Channel` resumption, which we don't do here.
This consolidates the common `if during_startup { push background event } else { apply ChannelMonitorUpdate }` pattern by simply inlining it in `handle_new_monitor_update`.
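A minimal sketch of the pattern being inlined, under illustrative names: during startup the update is queued as a background event, otherwise it is applied right away.

```rust
struct MonitorUpdateSketch { update_id: u64 }

fn handle_update_sketch(
	during_startup: bool,
	update: MonitorUpdateSketch,
	background_events: &mut Vec<MonitorUpdateSketch>,
	apply_now: impl FnOnce(MonitorUpdateSketch),
) {
	if during_startup {
		// Too early to touch the chain monitor; replay this once startup completes.
		background_events.push(update);
	} else {
		apply_now(update);
	}
}
```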
Force-pushed from 8be0b8e to 5d5971c
LGTM. Feel free to squash, IMO.
Based on #3413, this finally implements async `ChannelMonitorUpdate` persistence against closed channels. After all the prep, it's... not all that bad. Would still like to write an additional test for the second-to-last commit, though the changes in that commit do show it has some coverage.